A field-theoretic approach to linear scaling ab-initio molecular dynamics 
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We present a field-theoretic method suitable for linear scaling molecular dynamics simulations 
using forces from self-consistent electronic structure calculations. It is based on an exact decompo- 
sition of the grand canonical potential for independent fermions and does neither rely on the ability 
to localize the orbitals nor that the Hamilton operator is well-conditioned. Hence, this scheme 
enables highly accurate all-electron linear scaling calculations even for metallic systems. The inher- 
ent energy drift of Born-Oppenheimer molecular dynamics simulations, arising from an incomplete 
convergence of the self-consistent field cycle, is solved by means of a properly modified Langevin 
equation. The predictive power of this approach is illustrated using the example of liquid methane 
under extreme conditions. 

PACS numbers: 31.15.aq, 31.15.E-, 71.15.Dx, 71.15.Pd 



INTRODUCTION 

Ab-initio molecular dynamics (AIMD), where the 
forces are calculated on-the-fly by accurate electronic 
structure methods, has been very successful in explain- 
ing and predicting a large variety of physical phenom- 
ena and guiding experimental work. However, the in- 
creased accuracy and predictive power of AIMD simu- 
lations comes at a significant computational cost, which 
has limited the attainable length and time scales in spite 
of recent progress [1]. As a consequence, Hartree-Fock 
(HF), density functional theory (DFT) [2] and even the 
semi-empirical tight-binding (TB) approach [3] are to 
date the most commonly used electronic structure meth- 
ods in conjunction with AIMD. However, for large sys- 
tems the calculation of the electronic structure and hence 
total energies as well as nuclear forces of atoms and 
molecules is still computational fairly expensive. This 
is due to the fact that solving the Schrodinger equation 
is a high-dimensional eigenvalue problem whose solution 
requires diagonalizing the Hamiltonian of the correspond- 
ing system, which typically scales cubically with its size. 
Therefore, a method that scales linearly with the size of 
the system would be very desirable, thus making a new 
class of systems accessible to AIMD that were previously 
thought not feasible. For that reason developing such 
methods is an important objective and would have a ma- 
jor impact in scientific areas such as nanotechnology or 
biophysics, just to name a few. 

Several so called linear-scaling methods have been pro- 
posed [4-8] to circumvent the cubic scaling diagonaliza- 
tion that is the main bottleneck of DFT and TB. Un- 
derlying all of these methods is the concept of "near- 
sightedness" [9, 10], an intrinsic system dependent prop- 
erty, which states that at fixed chemical potential the 



electronic density depends just locally on the external 
potential, so that all matrices required to compute the 
Fermi operator will become sparse at last. Together with 
sparse matrix algebra techniques linear scaling in terms 
of memory requirement and computational cost can be 
eventually achieved. However, the crossover point after 
which linear scaling methods become advantageous is still 
rather large, in particular for metallic systems and/or if 
high accuracy is needed. 

Therefore another method, based on the grand- 
canonical potential (GCP) for independent fermions, has 
been recently developed [11, 12]. Krajewski and Par- 
rinello demonstrated that by decomposing the GCP it is 
possible to devise an approximate stochastic linear scal- 
ing scheme [13-15]. Since this approach does not rely on 
the ability to localize the electronic wavefunction, even 
metals can be treated. However, due to its stochastic 
nature extending such a method towards self-consistent 
TB, DFT or HF is far from straightforward. 

This is where we start in this paper. Following previous 
work of Ceriotti, Kiihne and Parrinello [16, 17] we com- 
pute here the finite-temperature density matrix, or Fermi 
matrix, in an efficient, accurate and in particular deter- 
ministic fashion by a novel hybrid approach. Inspired 
by the Fermi operator expansion method pioneered by 
Goedecker and coworkers [4, 18, 19], the Fermi operator 
is described in terms of a Chebyshev polynomial expan- 
sion, but in addition accompanied with fast summation 
as well as iterative inversion techniques. The resulting 
algorithm has several important advantages. As before 
the presented scheme does rely on the ability to localize 
the orbitals, but requires only that the Hamiltonian ma- 
trix is sparse, a substantially weaker requirement. As a 
consequence not only metals, but even systems for which 
the Fermi matrix is not sparse yet can be treated with 
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linear scaling computational effort. Moreover, this ap- 
proach scales sub-linearly with the width of the Hamil- 
tonian spectrum, which is very attractive for all-electron 
calculations or when a high energy resolution is required. 
Since all matrices arising from the decomposition of the 
GCP obey substantially better condition numbers than 
the Hamiltonian matrix [16], our method is particularly 
well suited for highly accurate linear scaling electronic 
structure calculations. The fact that the present scheme 
is based on the GCP inherently entails finite electron 
temperature, which is not only in line with finite tem- 
perature simulations such as AIMD, but furthermore also 
allows for computations of systems with excited electrons 
[20]. In addition, we would like to mention that our al- 
gorithm is intrinsically parallel, since the resulting terms 
resulting from the decomposition of the GCP are inde- 
pendent and can be separately calculated on different 
processors. Beside the method itself, we will show that it 
is indeed possible to perform fully self-consistent AIMD 
simulations and demonstrate our novel scheme on liquid 
methane at planetary pressure and temperature condi- 
tions. 

This article is organized as follows. In section II the 
basic methodology of our method is established, while in 
section III we circumstantiate our novel hybrid approach. 
Section IV is describes the implementation within a self- 
consistent AIMD framework, while section V is devoted 
to the application on liquid methane at high tempera- 
ture and pressure, as well as to the analysis of the actual 
computational complexity with respect to system size. 



BASIC METHODOLOGY 

We begin with the generic expression for the total en- 
ergy E of an effective single particle theory, such as HF, 
DFT or TB 



written as 
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The first term denotes the so-called band-structure en- 
ergy, which is given by the sum of the lowest N dou- 
bly occupied eigenvalues E\ of an arbitrary Hamiltonian 
H. In DFT for instance, H is the Kohn-Sham matrix, 
while Vdc accounts for double counting terms as well as 
for the nuclear Coulomb interaction. In TB and other 
semi-empiricial theories H depends parametrically only 
on the nuclear positions and Vdc is a pairwise additive re- 
pulsion energy. While in either case it is well known how 
to calculate Vd c with linear scaling computational effort, 
the computation of all occupied orbitals by diagonaliza- 
tion requires 0(N 3 ) operations. Due to the fact that 
the band-structure term can be equivalently expressed in 
terms of the density matrix P, the total energy can be 



E = 2 J2 a + Vdc = Tt[PH] + Vdc 



(2) 



As a consequence, the cubic scaling diagonalization of H 
can be bypassed by directly calculating P rather than all 

Si's. 

To that extend, we follow Alavi and coworkers [11, 
12] and consider the following (Helmholtz) free energy 
functional 



T = Q + fiN e + Vdc 



(3) 



where /i the chemical potential, N e = 2N the number of 
electrons and 51 the GCP for noninteracting fermions 
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At this S stand for the overlap matrix, which is equiv- 
alent to the identity matrix I if and only if the orbitals 
are expanded in mutually orthonormal basis functions. 
In the GCP the electronic temperature is finite and given 
by /3 _1 = ksT . However, in the low-temperature limit 
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the band-structure energy can be recovered and 
lim^_ i . 00 T = E holds. We are now factorizing the op- 
erator of Eq. (4) into P terms. Given that P is even, 
which we assume in the following, we can make use of 
the following identity 
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where the matrices Mi, with I = 1, . . . , P are defined by 
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and * denotes complex conjugation. Similar to numer- 
ical path-integral calculations, it is in principle possible 
to exploit the fact that if P is large enough, so that j3/P 
is small, the exponential operator e^p^ s ~ H ^ can be ap- 
proximated amongst others by a Trotter decomposition 
or simply by a high-temperature expansion, i.e. 
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However, as we will see, here no such approximation is Inserting Eq. (10) into Eq. (9) we end up with the fol- 



required. In any case, we can rewrite f2 as 
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lowing field theoretic expression for the GCP: 
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As is customary in lattice gauge field theory [21, p. 17], 
where the minus sign problem is avoided by sampling a 
positive definite distribution, we write the inverse square 
root of the determinant as an integral over a complex field 
4>i, which has the same dimension D as the full Hilbert 
space, i.e. 



where, as already mentioned, D is the dimension of 
Mi Mi and </>/ are appropriate vectors. 

All physical relevant observables can be computed as 
functional derivatives of the GCP with respect to an 
appropriately chosen external parameter. For example, 
N e = —dVL/dfx and lim / 3_ i . 00 fl + ^,N e = 2 Y?,i=i e ii so that 

dfj, 
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Since the functional derivative of the constant in Eq. (11) 
is identical to zero, all physical interesting quantities can 
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Thereby, Eq. (13e) holds because of Montvay and Miin- 
ster [21, p. 18], while Eq. (13f) is due to the fact that 
beside being positive definite MfMi is also symmetric. 

Unlike Eq. (9), the determination of Q = d(/3Sl)/dfi 
does no longer require to calculate the inverse square root 
of a determinant, but only the inverse of Mi. But, since 
the inversion usually has to be performed P times, pre- 
sumably with a rather large prefactor. Nevertheless, as 
we will see later this can be much ameliorated and all 



but very few matrix inversions can be avoided. On the 
other hand, Mi is not only very sparse, since it obeys 
the same sparsity pattern as H, but is furthermore also 
always better conditioned as the same, so that even all 
its inverse matrices are substantially sparser than p and 
thus can be efficiently determined [16, 17]. Solving the 
N e sets of linear equations Mi&j = ipj, where {ipj} is 
a complete set of basis functions, the inverse can be ex- 
actly computed as M' 1 = Y,f=i fij^j within 0{N 2 ) 
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operations. 

Comparing Eq. (2) with Eq. (5) it is easy to see, that 
the GCP and similarly all physical significant observables 
can be written as the trace of a matrix product consist- 
ing of the Fermi matrix p, which in the low-temperature 
limit is equivalent to P. Specifically, f2 = d(/3n)/d/3 = 
Tr[pH] - /j,N e , but because N e = Tr[pS] holds at the 
same time, the former can be simplified to 



n = Tr[p(H - fiS)} 



(14) 



where S = —dH/d^j, and p = dil/dH. As a conse- 
quence, the GCP and all its functional derivatives can be 
reduced to evaluate p based on Eq. (13e) with A = Hij. 
Using the identity 

H| = -± {(M l I)/} + P{Mi ~ 1)} , (15) 

for this particular case Eq. (13e) eventually equals to 
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In other words, the origin of our method is the notion 
that the density matrix, can be decomposed into a sum 
of M^ 1 matrices, each at higher temperature /3/P. Yet, 
contrary to the original approach [13-15], neither a Trot- 
ter decomposition nor a high-temperature expansion for 
Eq. (7) has been used, so far everything is exact for any 
P. Nevertheless, beside the aforementioned reduction 
from cubic to quadratic scaling no computational savings 
have been gained either. Quite the contrary, at first sight 
it might even appear that our scheme, which requires to 
invert P matrices, is less efficient than explicitly diago- 
nalizing H. However, as already mentioned, in the next 
section we are going to demonstrate that this can be cir- 
cumvented for the most part by expressing all but very 
few matrix inversions through a Chebychev polynomial 
expansion. 



THE HYBRID APPROACH 

In order to make further progress and to achieve an 
even more favorable scaling, we can either approximate 
the propagator e^P^ s ~ H ' of Eq. (7) or exploit the fact 
that by increasing P in Eq. (16) the matrix exponen- 
tial and hence Mf l can be ever simpler calculated. In 
an analysis of the Mi matrices it is found that every 
Mi matrix is always better conditioned than H , so that 
M^ 1 exhibits less nonzero entries and is therefore easier 
to compute than the inverse of H, which would corre- 
spond to the complexity of calculating p directly [16]. 



Furthermore, it has been observed that just a handful 
of matrices Mi, where I is small, are ill-conditioned and 
only for them the inversion is computationally cumber- 
some. All other Mi matrices having a higher index are 
very well-conditioned, so that the inversion can be effi- 
ciently performed by a Chebyshev polynomial expansion 
[16]. 

These complementary properties of the Mi matri- 
ces immediately suggest the following hybrid approach. 
Thereby an optimal I is chosen such that 1 < I < P, 
where all Mi matrices with I > I are inverted by a poly- 
nomial expansion and only otherwise for I < I by an iter- 
ative Newton-Schulz matrix inversion. As long as Mi is 
not ill-conditioned the former has the advantage of being 
independent of P, so that increasing P will not increase 
the computational cost. Together with the fact that the 
number of ill-conditioned Mi matrices does only depend 
on the particular system and j3, but again not on P, the 
present hybrid approach allows to employ an arbitrary 
large P at basically no additional computational cost. In 
this way the decomposition of the GCP in Eq. (9) can be 
made exact in any order essentially for free. 

However, it is possible to even more improve this 
method by recognizing that H is real and symmetric and 
that the equality 



Mi = M 



p-i+i 



(17) 



holds. As a consequence, only the real parts of the Mi 
matrices are required to compute p (see Appendix ), 
which entails substantial savings in terms of computa- 
tional cost and memory requirement. From this it follows 
that Eq. (16) can be further simplified to 
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In addition, it is possible to rewrite ReM[ 1 in the 
following way: 

Re A4f 1 = i (l + ( e ^(«-^) _ lJJV, -1 ) , (19) 

where TV/ is a real valued matrix as defined in Ap- 
pendix . That is to say, that the whole problem can 
be reduced to invert TV/. As already mentioned if Ni is 
well-conditioned, its inverse is expressed by a Chebyshev 
expansion. For this purpose let us rewrite Ni in terms of 
a shifted and scaled auxiliary matrix 
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whose spectrum lies between —1 and 1. The 
corresponding shifting and scaling parame- 
ters z = ( e ^x/2P + e e min /2P)/ 2 an d £ = 

(e £max / 2P — e £mi "/ 2P ) /2 are expressed in terms of 



5 



the maximal and minimal eigenvalues of H , i.e. by e max 
and e m i n [17]. Since for the latter only a crude estimate 
is required, they can be efficiently approximated using 
Gershgorin's circle theorem [22] as 



> max 



< min 




Therewith, for I > I, we can approximate 7V ; 
sum of Chebychev polynomials of X by 
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as a 



(22) 



where Tj are Chebyshev polynomials and c,; the corre- 
sponding coefficients. The upper bound m and thus the 
number of terms in the summation depends on the pre- 
defined target accuracy. 

After having computed the inverse of all the well- 
conditioned TV; matrices, we have to deal with the very 
few ill-conditioned ones. As already indicated this is ac- 
complished by the following Newton-Schulz iteration 



A k+1 =2A k - AkNtAk, fc = 0,l,. 



(23) 



which converges quadratically to N l 1 given that Aq is a 
good initial guess. Fortunately, we can make use of NF_ 

as an initial guess for Nj^} , n £ {0, 1, . . . , I — 1}. If 
this ansatz fails, it is always possible to choose 



A = jvf(||jv,||i||jv; 



{ CO I 



(24) 



which is guaranteed to converge [23]. However, as long as 
the former initial guess converges, it is computationally 
superior than Eq. (24) and typically requires rather few 
Newton-Schulz iterations to compute N[~ . A general- 
ization of Eq. (23) to calculate arbitrary p-th roots and 
its inverse will be published elsewhere [24]. 

However, the matrix-matrix multiplications of Eq. (23) 
causes that lirn/^oo A k+ i eventually becomes fairly oc- 
cupied. For this reason in order to sustain linear scaling 
the intermediate matrices are truncated. Nevertheless, as 
already mentioned, the condition number of Ni is always 
lower than the one of H and typically even rather well- 
conditioned, so that N^ 1 is by definition substantially 
sparser than p. From this it follows that the necessary 
truncation cutoff is relatively weak and the approxima- 
tion therefore very small, so that highly accurate linear 
scaling electronic structure calculations are still possible. 



PERFORMING SELF-CONSISTENT 



MOLECULAR DYNAMICS SIMULATIONS 

The chief advantage of this procedure is not only that 
it allows for accurate linear scaling calculations, but is 
furthermore also deterministic. Hence, at variance to the 
original approach [13-15], where the corresponding ma- 
trices are inverted by an approximate stochastic method, 
it is now possible to perform calculations using Hamilton 
operators of fully self-consistent mean-field theories, such 
as HF, DFT and TB. 

We have tested our new method in the context of elec- 
tronic structure based molecular dynamics (MD) using a 
self-consistent TB (SCTB) model [25] and implemented 
it in the CMPTool program package [26, 27]. In the self- 
consistent field (SCF) optimization loop self-consistency 
is realized by imposing local charge neutrality, to account 
for charge transfer processes, as well as bond breaking 
and formation. This means that the number of elec- 
trons of every atom a has to be equal to the number of 
its valence electrons within an adjustable tolerance, 
which we named Aq max . To that extend, during the 
SCF loop the diagonal elements of H are varied using 
a linear response function © until local charge neutral- 
ity is achieved. Specifically, in each MD step first H 
is built up, whereas in every SCF iteration we calculate 
the shift-vector Ah to the diagonal elements of H . The 
latter are the so called on-site energies = Ha, while 
the diagonal elements of pS represents the occupancy of 
the corresponding orbital, hence N e = Tr[pS]. Summing 
over all orbitals centered on any particular atom a, one 
obtains the associated on-site charge q a . Local charge 
neutrality is enforced by calculating A^ = ®(q^ — q a ) 
for every SCF iteration k and shifting the on-site energies 
using e^ +1 = ei + A H . So adapted, H k is diagonalized 
using the above formalism until max a \q a — q^\ < A(7 max . 
In that case, instead of being grand-canonical the simu- 
lation is performed at constant N e . 

However, as already recognized by Kress et al. [28] 
using the present SCTB model [25], the SCF cycle is very 
slowly converging and the number of necessary iterations 
critically depending on Aq max . Nevertheless, this can 
remedied by adapting the method of Kiihne et al. [1] 
in such a way that instead of a coupled electron-ion MD 
only the modified predictor-corrector integrator is used 
to propagate A// in time. In the framework of DFT this 
scheme has shown to be particularly effective for a large 
variety of different systems [29]. Inspired by the original 
scheme of Kolafa [30] here 
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is used as a modified predictor, where A//(i„) p is an 
estimate for Ajj(t n ) of the next MD time step t n and is 
approximated using the weighted shifts of the K previous 
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time steps. As we will show in Appendix the weights 

(_l)m+l m ^^. (26) 
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always add up to 1, so that K = 1 causes that Wi is 
identical to 1, i.e. A#(t„) p = A#(t„_i). 

However, contrary to the Car-Parrinello-like approach 
to Born-Oppenheimer MD of Kiihne et al. [1], where 
in each MD step only a single preconditioned electronic 
gradient calculation is required as the corrector, here the 
predicted A#(i„) p is only used as an initial guess for the 
SCF cycle, which requires at least a single if not multiple 
diagonalizations. That is to say that instead of a gen- 
uine Car-Parrinello-like dynamics [1, 31], a less efficient 
accelerated Born-Oppenheimer MD (BOMD) [32-37] is 
performed. 

Nevertheless, in this way the convergence rate of the 
SCF cycle is much increased, while at the same time even 
allowing for a rather tight tolerance threshold. In fact, 
comparing with the employed convergence criterion of 
Kress et al. [28], here Ag max can be chosen to be at least 
one to two orders of magnitude smaller without requiring 
numerous SCF iterations. 

Due to the fact that the present scheme is equivalent 
to diagonalizing H, as for any SCF theory based BOMD 
simulation, the interatomic forces thus calculated are af- 
fected by a statistical noise Ej , except for the unrealistic 
case that Aq max = 0. Hence, instead of the exact forces 
Fj, merely an approximation iT'BOMD _ Fj + Sf is com- 
puted, where F? OMD are the BOMD forces calculated 
by an arbitrary SCF based theory. Even though, E^ 
can, to a very good approximation, be assumed as white 
[14], the line integral defining the net work is always pos- 
itive and thus the energy drift of a microcanonical MD 
simulation. While the noise may be tiny and the forces 
highly accurate, as far as static calculations such as ge- 
ometry optimization are concerned, the resulting energy 
drift is way more critical. An energy drift of as small 
as 1 /ieV/(atomxps) grows to an aberration of 10 K/ns 
and may cause that liquid water for instance evaporates 
within a couple of nanoseconds simply because of the 
energy drift immanently present in any BOMD simula- 
tion [38]. Therefore, at least in principle, it is no longer 
guaranteed that by solving Newton's equation of motion 
(EOM) the correct Boltzmann averages are obtained. 

Fortunately, only based on the assumption that E^ is 
unbiased, this can be rigorously corrected by devising the 
following modified Langevin equation [1, 14] 



M r Ri =F r + E, - inMiRi 
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where Afj is the nuclear mass and 7jv a friction coefficient 
to compensate for the noise Ej . The latter has to obey 



F 7 (0)Ef(t))=0, 



(28) 



as well as the so called fluctuation-dissipation theorem 

(Sf(O)Ef(i)) =2 lN k B TM I 8{t). (29) 

If we would know 7jv such that Eq. (29) is satisfied, a 
genuine Langevin equation is recovered, which guaran- 
tees for an accurate canonical sampling of the Boltzmann 
distribution. However, at first sight this may look like an 
impossible undertaking, since we neither know Fj, nor 
E^ from which 7^ can be deduced. Nevertheless, it is 
possible, even without knowing E^ except that it is ap- 
proximately unbiased, to determine 7^ directly simply 
by varying it in such a way that the equipartition theo- 
rem (hMiR 2 ^ = |/s_bT holds. Once 7jv is determined, 

it must be kept constant for the whole simulation. But 
then it is possible to exactly and very efficiently calculate 
static and even dynamic observables without knowing F \ , 
but just fP° md . Due to the fact that the same also 
holds for noise introduced by truncating the intermedi- 
ate matrices of Eq. (22), as well as using finite-precision 
arithmetic and a non- vanishing integration time step, the 
corresponding noise terms simply add to E^ . 



LIQUID METHANE AT EXTREME PRESSURE 
AND TEMPERATURE 



For the purpose of demonstrating our new method, we 
present here initial results on liquid CH4 at high pres- 
sure and temperature. All BOMD simulations have been 
performed using the SCTB model for hydrocarbons [25] 
as implemented in the CMPTool code [27]. The calcula- 
tions have been performed in the canonical ensemble at 
T = 2000 K and volume V = 10.04 cm 3 /mol, which cor- 
responds to the second-shock at pressure P = 92 GPa of 
a two-stage light-gun shock compression experiment [39] . 
The EOM of Eq. (27b) is integrated using a discretized 
time step of At = 0.5 fs. We are considering here a peri- 
odic cubic box of length L\ = 12.775 A, consisting of 125 
liquid CH4 molecules as our unit cell. The local charge 
neutrality threshold of the SCF loop A^max = 0.05. Since 
we are dealing with a large disordered system at finite 
temperature, the Brillouin zone is sampled at the r-point 
only. 

To assess the accuracy we study a sample comprising 
of 1000 CH4 molecules (2x2x2 the size of our unit cell) 
at T = 2000 K and compare the partial pair-correlation 
functions, as obtained by our novel scheme, with the re- 
sults of Kress et al. [28] using exactly the same model 
[25]. As can be seen in Fig. 1 the agreement is excel- 
lent. Further calculations to investigate the dissociation 
of methane at even higher temperature and its implica- 
tion for giant gas planets such as Uranus and Neptune 
will be discussed in a forthcoming paper [40]. 

In order to sustain linear scaling in terms of compu- 
tational cost and at the same time memory requirement, 
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Figure 1. (Color online) Partial pair-correlation functions g(r) 
of liquid CH 4 at 2000 K. 
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Figure 2. (Color online) The average walltime for a single 
SCTB MD step versus the number of atoms on a single core 
of a 2.40 GHz Intel Westmere processor. 



all sparse matrices are stored in the common Compressed 
Row Storage format. Due to the fact that our algorithm 
heavily relies on the multiplication of sparse matrices, 
we have put particular emphasis on an efficient parallel 
implementation. In that the data is distributed to the 
individual processor cores by employing a space-filling 
Hilbert curve to keep the load balanced [41]. While for 
solid state systems a very good scalability has been ob- 
served, for disordered liquids studied here the situation 
is substantially less favorable. A more efficient scheme, 
which dynamically rearranges the matrices between the 
various processor cores, or even distributes them fully at 
random is current work in progress. 

Nevertheless, to demonstrate that linear system size 
scaling is indeed attained, in Fig. 2 the average runtime 
for a complete SCTB MD step at T = 2000 K is shown for 
various system sizes using a single core of a 2.40 GHz In- 
tel Westmere processor. Specifically, we have considered 
five different systems, beginning with our unit cell up to 
5x5x5 replications of it, which corresponds to 625, 5000, 
16875, 40000 and 78125 atoms, respectively. As can be 
seen in Fig. 2, for small system sizes up to around 10000 
atoms the scaling is even sub-linear and thereafter per- 
fectly linear. 

However, beside the formal scaling with system size, 
the corresponding prefactor is also rather important and 
depends on Ae, the spectral width of H in unit of ksT. 
In the case of Chebychev polynomial based Fermi opera- 
tor expansion methods, the computational cost has been 
found to scale like A/3Ae, in order to achieve an accuracy 
of 10~ A [42]. Apparently, this entails a fairly large pref- 
actor if either high accuracy is required, the electronic 
temperature is low, or when Ae is large. The latter is 
typically the case for an all-electron calculation, or if a 
plane wave basis set is employed. Nevertheless, the usage 
of fast polynomial summation methods leads to the more 



favorable scaling \//3Ae [43, 44]. For the present hybrid 
approach this results in an even better sub-linear scal- 
ing of \fjJKe [17], which makes it particularly attractive 
for highly accurate all-electron ab-initio calculations, or 
when a high energy resolution is required. Together with 
the methods proposed by Lin et al., this is the best scaling 
with respect to (3 and Ae reported so far [45, 46]. Based 
on a multipole representation of the Fermi operator, the 
latter scales as ln(/3Ae) ln(ln(/3Ae)), which depending on 
the actual value of /3Ae is either slightly lower or larger 
than our cubic root scaling. 



CONCLUSION 

In conclusion, we would like to mention, that the here 
presented method can be directly applied to fully self- 
consistent DFT calculations by writing Vdc of Eq. (1) as 



V dc [p(r) 



dr / dr 



,p(r)p(r') 



dr p(r) 



60* 



xc 



Sp(r) 



xc 



E 



ii , 



(30) 



where the first term on the right hand side is the double 
counting correction of the Hartree energy, while fixe is 
the finite-temperature exchange and correlation grand- 
canonical functional and En the nuclear Coulomb inter- 
action. Except for the latter term, Eq. (30) accounts for 
the difference between the GCP for independent fermions 
fi and the GCP for the interacting spin-i Fermi gas 



fiint [p(r)] = - - In det f 1 + e ^ s - H 



2. J r-r' 



dr P( r )7T7TT + ^XC- 



5p{r) 



(31) 
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As before, in the low-temperature limit Vti nt [p{r)] + fiN e 
equals to the band-structure energy, whereas fixe cor- 
responds to the familiar exchange and correlation en- 
ergy, so that in this limit T = SI + fiN e + Vdc = 
^int[p(f)]+iJ,N e + Ej] is equivalent to the Harris- Foulkes 
energy functional [47]. Such as the latter, T is explicitly 
defined for any p[r) and obeys exactly the same station- 
ary point as the finite-temperature of Mermin [48] . 



The formal analogy of our decomposition to the Trot- 
ter factorization immediately suggests the possibility to 
apply some of the here presented ideas with benefit to nu- 
merical path-integral calculations [49] . The same applies 
for a related area where these methods are extensively 
used, the lattice gauge theory to quantum chromody- 
namics [50], whose action is rather similar to the one of 
Eq. (10). 
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A" 
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m—\ 
K 

= £(- 

m— 1 



All but the first summand cancels out so that we get 

m— 1 \ / \ / 

which after inserting it into Eq. (33) equals to 

m=l \ / \ / 
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Proof that M t *M t 6E DxD 

Using equation (17) and the fact that loi := e^ 2 ' -1 ) 
denotes a point on the unit circle of the complex plane 

M* l M l = (l - une&W-Dy (i _ Wje £(MS-H)) 

= 1 - (uj l + UJl )e^ s -V + (uJm)e^ s -V 
= 1 - 2 Re ^ e M^-H) + J^S-H) 

= (l + e^ H -^ - 2Recu l e ^ H ~^ e^ 3 ^ 
~ Nie ^S-H) £]r dxd (32) 

where D is the number of basis functions and therefore 
the dimension of the real matrix Af z * Mi . 

Proof that J2m=i Wm = 1 
For the purpose to show that 

K K I IK \ 

Y^w m =J2(-^ m+1 mjmr = l, (33) 

m=l m=l \K-l) 

we make use of the Appendix of Ref. (30) and write 



(34a) 



(34b) 



(34c) 
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